Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fix for _RasterizeToPixels back to avoid NaNs #235

Merged
merged 1 commit into from
Jun 22, 2024

Conversation

opipari
Copy link
Contributor

@opipari opipari commented Jun 21, 2024

This is a minor fix to avoid NaN values resulting from the backward pass of _RasterizeToPixels when padding is applied to the color tensor. Specifically, when the color tensor's feature dimension (COLOR_DIM in kernel) is not supported by a templated kernel, padding is applied as a preprocessing step with torch.empty to add channels up to the nearest supported COLOR_DIM. In this case, those empty values will be interpreted as undefined values during the backward pass which may populate the gradients for means2d, conics, and opacities with NaN values. Once the gradients for those three tensors have NaNs, the NaN values can propagate to other tensors and gradients. A simple fix is proposed in this pull request by explicitly initializing the padded channels' values with zero.

At line 1081 in the corresponding backward kernel there's an iteration over the color dimensions for a summation which ultimately seems to be used to calculate the output gradient for means2d, conics, and opacities. At this point in the kernel, if any of the input color channels have a NaN, the summation's result is undefined. Thus, I don't see an obvious solution at the kernel-level without giving up register space, but initializing those padded channels with zeros explicitly solves the issue and based on my testing doesn't come with any performance drop.

I tried creating a minimal working example of this issue for reproducibility, but since the empty tensor values' "initialization" is dependent on the program's memory space at execution, the example isn't guaranteed to be reproducible on a different machine. I'd be happy to update this request to include an explicit dtype in the torch.zeros initialization to match color.dtype if that is preferred for style. Thanks for considering this request.

Thank you for this awesome project!

Copy link
Collaborator

@liruilong940607 liruilong940607 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey yes this makes a lot of sense! And i agree zero initialize is the correct fix! We had the same issue with the compensation which has to be initialized with zero instead of empty. Thanks for digging into this!

@liruilong940607 liruilong940607 merged commit 3b38568 into nerfstudio-project:main Jun 22, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants